Quarter Review

PS312 Statistical Research Methods

Review of the Quarter

Substantive

  • Causality

  • Mechanisms

  • Confounders

Technical

  • Statistical Tests

  • Regressions

  • Diagnostics

Distributions (I)

library(tidyverse)    # provides ggplot() and, later, %>% and left_join()

set.seed(123)         # set seed for reproducibility

norm_d = rnorm(10000) # generate random observations

ggplot() +
  geom_histogram(aes(x = norm_d)) +
  labs(x = NULL,
       y = NULL) +
  theme_bw()

Distributions (II)

ggplot() +
  geom_boxplot(aes(x = norm_d)) +
  labs(x = NULL,
       y = NULL) +
  theme_bw()
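
The box plot above summarises the sample with a handful of numbers. As a rough sketch, they can be computed directly in base R. The variable names `five_num`, `lower_fence`, `upper_fence` and `n_outliers` are illustrative and not part of the original slides; the 1.5 × IQR rule is the default whisker rule in `geom_boxplot()`.

```r
set.seed(123)
norm_d <- rnorm(10000)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five_num <- fivenum(norm_d)
names(five_num) <- c("min", "lower_hinge", "median", "upper_hinge", "max")

# Quartiles and interquartile range (the edges and width of the box)
q   <- quantile(norm_d, probs = c(0.25, 0.75))
iqr <- unname(q[2] - q[1])

# Points beyond 1.5 * IQR from the box are drawn as individual outliers
lower_fence <- unname(q[1]) - 1.5 * iqr
upper_fence <- unname(q[2]) + 1.5 * iqr
n_outliers  <- sum(norm_d < lower_fence | norm_d > upper_fence)

print(five_num)
print(n_outliers)
```

Reading the histogram and the box plot next to these numbers helps connect the pictures to the underlying quantiles.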

Distribution Comparisons

set.seed(123)

norm_x = rnorm(n = 10000, mean = 1, sd = 3)
norm_y = rnorm(n = 10000, mean = 3, sd = 2)

t.test(norm_x, norm_y) # Welch test: R's default, does not assume equal variances

    Welch Two Sample t-test

data:  norm_x and norm_y
t = -55.188, df = 17450, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.059542 -1.918262
sample estimates:
mean of x mean of y 
0.9928849 2.9817871 

ggplot() +
  geom_histogram(aes(x = norm_x, fill = "Distribution X"), alpha = 0.5) +
  geom_histogram(aes(x = norm_y, fill = "Distribution Y"), alpha = 0.5) +
  geom_vline(xintercept = mean(norm_x), color = "red") +
  geom_vline(xintercept = mean(norm_y), color = "blue") +
  labs(x = NULL,
       y = NULL,
       fill = NULL) +
  theme_bw()
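
To see where the t statistic and the fractional degrees of freedom in the Welch output come from, the test can be reproduced by hand. This is a minimal sketch that regenerates the same two samples; the names `se2_x`, `se2_y`, `t_stat`, `df_welch` and `p_value` are illustrative and do not appear in the original slides.

```r
set.seed(123)
norm_x <- rnorm(n = 10000, mean = 1, sd = 3)
norm_y <- rnorm(n = 10000, mean = 3, sd = 2)

n_x <- length(norm_x)
n_y <- length(norm_y)

# Squared standard error of each sample mean
se2_x <- var(norm_x) / n_x
se2_y <- var(norm_y) / n_y

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(norm_x) - mean(norm_y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (n_x - 1) + se2_y^2 / (n_y - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# Compare against the built-in test
welch <- t.test(norm_x, norm_y)
print(all.equal(t_stat, unname(welch$statistic)))
print(all.equal(df_welch, unname(welch$parameter)))
```

Because the two samples have different variances, the degrees of freedom fall below the pooled value of n_x + n_y - 2.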

Data Merging

df_x = data.frame(ID = c("1", "2", "3", "4"), 
                X = c(34, 22, 19, 85))

df_y = data.frame(ID = c("1", "2", "4", "4"), 
                Y = c("Blue", "Red", "Green", "Yellow"))

df_x
ID   X
1   34
2   22
3   19
4   85

df_y
ID  Y
1   Blue
2   Red
4   Green
4   Yellow

Data Merging

df_x %>% 
  left_join(df_y, by = "ID") 
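
Result of the left join
ID   X  Y
1   34  Blue
2   22  Red
3   19  NA
4   85  Green
4   85  Yellow

ID 3 has no match in df_y, so Y is NA. ID 4 matches two rows in df_y, so its row appears twice.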
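
Since a left join can silently add rows or introduce missing values, it can help to inspect the keys before and after merging. The following is one possible sketch using dplyr; the object names `dup_keys`, `unmatched` and `joined` are hypothetical and not from the original slides.

```r
library(dplyr)

df_x <- data.frame(ID = c("1", "2", "3", "4"),
                   X  = c(34, 22, 19, 85))
df_y <- data.frame(ID = c("1", "2", "4", "4"),
                   Y  = c("Blue", "Red", "Green", "Yellow"))

# Keys that occur more than once in the right-hand table:
# each of these duplicates the matching row of df_x in the join
dup_keys <- df_y %>%
  count(ID) %>%
  filter(n > 1)

# Rows of df_x with no partner in df_y: these end up with Y = NA
unmatched <- df_x %>%
  anti_join(df_y, by = "ID")

joined <- df_x %>%
  left_join(df_y, by = "ID")

print(dup_keys)
print(unmatched)
print(nrow(joined) - nrow(df_x))  # number of rows added by the join
```

Comparing `nrow()` before and after the join is a cheap check that is easy to add to any merging step.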

Libraries and Functions (I)

Library       Functions                     Description
tidyverse     filter(), mutate(), ggplot()  data wrangling and visualization
modelsummary  modelsummary()                present good-looking tables
ggeffects     ggpredict()                   calculate and visualize marginal effects

Libraries and Functions (II)

Library    Functions            Description
GGally     ggpairs(), ggcoef()  extension to ggplot2
ggfortify  autoplot()           extension to ggplot2 for diagnostics
lmtest     bptest()             statistical tests for diagnostics
car        vif()                additional statistical tests for diagnostics
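
As a brief illustration of the two diagnostic functions listed above, the sketch below fits a small regression and runs both checks. The model (`mpg ~ wt + hp`) and the built-in `mtcars` data are assumptions chosen only for illustration; they are not taken from the course material.

```r
library(lmtest)  # bptest()
library(car)     # vif()

# Illustrative model on a built-in dataset (an assumption, not course data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Breusch-Pagan test; null hypothesis: constant error variance
bp <- bptest(fit)

# Variance inflation factors; larger values suggest more collinearity
vifs <- vif(fit)

print(bp)
print(vifs)
```

A small Breusch-Pagan p-value would point to heteroskedasticity, while variance inflation factors near 1 would suggest little collinearity among the predictors.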